AITopics | state size

state size

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing Systems, Jun-30-2026, 08:10:46 GMT

Modern state-space models (SSMs) often utilize structured transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost, even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with provably optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs.
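As a rough illustration of what "state tracking" with a sparse transition matrix means, the sketch below emulates a two-state parity automaton with the linear recurrence h_t = A(x_t) h_{t-1}, where each A(x_t) has a single 1 per column. It is not the parametrization proposed in the paper; the automaton, `TRANSITIONS`, and all function names are hypothetical, chosen only to contrast an O(N^2) dense update with an O(N) index-based update.

```python
# Illustrative sketch only: a parity automaton tracked by a linear recurrence
# h_t = A(x_t) h_{t-1}, where each A(x) has exactly one 1 per column.
# TRANSITIONS and the helper names are hypothetical, not taken from the paper.

# TRANSITIONS[symbol][state] -> next state; a 2-state parity FSA over {0, 1}.
TRANSITIONS = {
    0: [0, 1],  # reading 0 keeps the current state
    1: [1, 0],  # reading 1 flips the current state
}


def dense_matrix(symbol, n_states):
    """Build the dense N x N transition matrix with one 1 per column."""
    cols = TRANSITIONS[symbol]
    return [[1.0 if cols[j] == i else 0.0 for j in range(n_states)]
            for i in range(n_states)]


def dense_step(matrix, h):
    """Unstructured update: O(N^2) multiply-adds per token."""
    return [sum(a * x for a, x in zip(row, h)) for row in matrix]


def sparse_step(symbol, h):
    """Structured update: O(N) scatter that uses the column indices directly."""
    out = [0.0] * len(h)
    for j, i in enumerate(TRANSITIONS[symbol]):
        out[i] += h[j]
    return out


def track(symbols, n_states=2, use_sparse=True):
    """Run the recurrence from one-hot state 0 and decode the final state."""
    h = [1.0] + [0.0] * (n_states - 1)
    for s in symbols:
        if use_sparse:
            h = sparse_step(s, h)
        else:
            h = dense_step(dense_matrix(s, n_states), h)
    return h.index(max(h))


bits = [1, 0, 1, 1]
final_state = track(bits)  # parity of the number of 1s seen so far
```

Both update paths produce the same hidden state; the sparse path simply avoids materializing the N x N matrix, which is the kind of saving the abstract attributes to structured transition matrices.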

artificial intelligence, proceedings, transition matrix, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.60)

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing Systems, Jun-18-2026, 16:27:27 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Neural Information Processing Systems, Mar-22-2026, 15:03:30 GMT

Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and reduced state size. Additionally, retaining the softmax operation is particularly beneficial in "finetuning pretrained Transformers to RNNs" (T2R) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
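To make the idea of a bounded slot memory with gated forgetting and softmax reading more concrete, here is a minimal sketch of one recurrent step. It is one plausible reading of the abstract, not the authors' implementation: the per-slot gate `alpha`, the convex-combination update, and every function and variable name are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_slot_step(K, V, q, k, v, alpha):
    """One recurrent step over m memory slots (hypothetical formulation).

    K, V  : m x d slot memories, updated in place
    q, k, v : length-d query, key, and value for the current token
    alpha : length-m forget gates in [0, 1]; 1 keeps a slot, 0 overwrites it
    """
    # Adaptive forgetting: each slot blends its old content with the new token.
    for i, a in enumerate(alpha):
        K[i] = [a * old + (1.0 - a) * new for old, new in zip(K[i], k)]
        V[i] = [a * old + (1.0 - a) * new for old, new in zip(V[i], v)]
    # Context-aware reading: softmax over slot scores, then mix the value slots.
    scores = [sum(ki * qi for ki, qi in zip(row, q)) for row in K]
    weights = softmax(scores)
    return [sum(w * V[i][j] for i, w in enumerate(weights))
            for j in range(len(v))]


# Toy run: 2 slots, dimension 2. The state stays 2 x 2 however many tokens arrive.
m, d = 2, 2
K = [[0.0] * d for _ in range(m)]
V = [[0.0] * d for _ in range(m)]
tokens = [([1.0, 0.0], [1.0, 0.0], [2.0, 3.0]),
          ([0.0, 1.0], [0.0, 1.0], [4.0, 5.0])]
for q, k, v in tokens:
    out = gated_slot_step(K, V, q, k, v, alpha=[0.5, 0.9])
```

The point of the sketch is the memory footprint: the state is two m x d arrays regardless of sequence length, which is what "compact recurrent state size" refers to in the abstract.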

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Consumer Health (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

ebd9629fc3ae5e9f6611e2ee05a31cef-Supplemental.pdf

Neural Information Processing Systems, Feb-19-2026, 08:35:36 GMT

Dataset (1) consists of various lines in the image at a discrete set of angles, and the classification task is to detect the angle of the line. Some images from the test set of classes 80 and 100 are multiplied with a permutation matrix to randomly permute rows and columns.

artificial intelligence, machine learning, rnnpool, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

e6e706454d72c18582b9c1ff70b11f7d-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 17:12:13 GMT

artificial intelligence, machine learning, probability flow, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

d3f39e51f5f634fb16cc3e658f8512b9-Paper-Conference.pdf

Neural Information Processing Systems, Nov-20-2025, 04:38:31 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(9 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.92)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Neural Information Processing Systems, Oct-10-2025, 17:39:00 GMT

This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and reduced state size.

proceedings, state size, transformer, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(9 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.92)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Formulating Discrete Probability Flow Through Optimal Transport

Neural Information Processing Systems, Oct-9-2025, 10:19:48 GMT

Continuous diffusion models are commonly acknowledged to display a deterministic probability flow, whereas discrete diffusion models do not. In this paper, we aim to establish the fundamental theory for the probability flow of discrete diffusion models. Specifically, we first prove that the continuous probability flow is the Monge optimal transport map under certain conditions, and also present equivalent evidence for discrete cases.

diffusion model, initial point, probability flow, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

641d77dd5271fca28764612a028d9c8e-Supplemental.pdf

Neural Information Processing Systems, Oct-3-2025, 02:17:39 GMT

artificial intelligence, cot-gan, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

StateX: Enhancing RNN Recall via Post-training State Expansion

Shen, Xingyu, Chen, Yingfa, Thai, Zhen Leng, Han, Xu, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Sep-29-2025

While Transformer-based models have demonstrated remarkable language modeling performance, their high complexities result in high costs when processing long contexts. In contrast, recurrent neural networks (RNNs) such as linear attention and state space models have gained popularity due to their constant per-token complexities. However, these recurrent models struggle with tasks that require accurate recall of contextual information from long contexts, because all contextual information is compressed into a constant-size recurrent state. Previous works have shown that recall ability is positively correlated with the recurrent state size, yet directly training RNNs with larger recurrent states results in high training costs. In this paper, we introduce StateX, a training pipeline for efficiently expanding the states of pre-trained RNNs through post-training. For two popular classes of RNNs, linear attention and state space models, we design post-training architectural modifications to scale up the state size with no or negligible increase in model parameters. Experiments on models up to 1.3B parameters demonstrate that StateX efficiently enhances the recall and in-context learning ability of RNNs without incurring high post-training costs or compromising other capabilities.

Recently, recurrent neural networks (RNNs) such as gated linear attention (GLA) (Yang et al., 2024) and Mamba2 (Dao & Gu, 2024) have shown promising capabilities in language modeling. These architectures have constant per-token complexity, while the more popular Transformer architecture (Vaswani et al., 2023) has per-token complexity that grows linearly with the context length.
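The contrast between a constant-size recurrent state and a cache that grows with context can be made concrete with some back-of-the-envelope arithmetic. The sketch below counts stored numbers under simplifying assumptions: a standard per-head key/value cache for the Transformer and one key_dim x value_dim matrix per head for a linear-attention-style RNN. The configuration values and function names are hypothetical and do not come from the paper.

```python
def kv_cache_entries(context_len, n_layers, n_heads, head_dim):
    """Transformer KV cache: one key and one value vector per token, head, and layer."""
    return 2 * context_len * n_layers * n_heads * head_dim


def recurrent_state_entries(n_layers, n_heads, key_dim, value_dim):
    """Linear-attention-style state: one key_dim x value_dim matrix per head and layer."""
    return n_layers * n_heads * key_dim * value_dim


# Hypothetical model configuration, chosen only for illustration.
cfg = {"n_layers": 24, "n_heads": 16, "head_dim": 128}

state = recurrent_state_entries(cfg["n_layers"], cfg["n_heads"],
                                cfg["head_dim"], cfg["head_dim"])
for context_len in (1_000, 10_000, 100_000):
    cache = kv_cache_entries(context_len, **cfg)
    # The cache scales linearly with context_len; the recurrent state does not.
    print(f"context={context_len:>7}  kv_cache={cache:>13}  rnn_state={state}")
```

Under these assumptions the recurrent state is fixed while the cache grows in proportion to the context, which is why recall in such RNNs is tied to how much information the fixed-size state can hold.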

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2263

Country: